budget constraint
Overcoming the Incentive Collapse Paradox
Yin, Qichuan, Su, Ziwei, Li, Shuangning
AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work by Bastani and Cachon (2025); Sambasivan et al. (2021) shows that accuracy-based payment schemes suffer from incentive collapse: as AI accuracy improves, sustaining positive human effort requires unbounded payments. We study this problem in a budget-constrained principal-agent framework with strategic human agents whose output accuracy depends on unobserved effort. We propose a sentinel-auditing payment mechanism that enforces a strictly positive and controllable level of human effort at finite cost, independent of AI accuracy. Building on this incentive-robust foundation, we develop an incentive-aware active statistical inference framework that jointly optimizes (i) the auditing rate and (ii) active sampling and budget allocation across tasks of varying difficulty to minimize the final statistical loss under a single budget. Experiments demonstrate improved cost-error tradeoffs relative to standard active learning and auditing-only baselines.
Off-Policy Evaluation and Learning for Survival Outcomes under Censoring
Kubota, Kohsuke, Takahashi, Mitsuhiro, Saito, Yuta
Optimizing survival outcomes, such as patient survival or customer retention, is a critical objective in data-driven decision-making. Off-Policy Evaluation~(OPE) provides a powerful framework for assessing such decision-making policies using logged data alone, without the need for costly or risky online experiments in high-stakes applications. However, typical estimators are not designed to handle right-censored survival outcomes, as they ignore unobserved survival times beyond the censoring time, leading to systematic underestimation of the true policy performance. To address this issue, we propose a novel framework for OPE and Off-Policy Learning~(OPL) tailored for survival outcomes under censoring. Specifically, we introduce IPCW-IPS and IPCW-DR, which employ the Inverse Probability of Censoring Weighting technique to explicitly deal with censoring bias. We theoretically establish that our estimators are unbiased and that IPCW-DR achieves double robustness, ensuring consistency if either the propensity score or the outcome model is correct. Furthermore, we extend this framework to constrained OPL to optimize policy value under budget constraints. We demonstrate the effectiveness of our proposed methods through simulation studies and illustrate their practical impacts using public real-world data for both evaluation and learning tasks.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > United Kingdom > England > Essex > Colchester (0.04)
- Europe > Italy > Sardinia (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
c50c42f853db0f1f5b4195358b6d97de-Supplemental-Conference.pdf
Let us imagine that the grand coalition is formed by one party joining the coalition at a time. Given an order of parties (i.e., a permutationπ of N), party i joins the coalitionPiπ which denotes all parties precedingi in π. It is well-known that the Shapley value, despite its fairness, is not replication robustness in data valuation [1]. This is because the two desirable properties for fairness: symmetry and efficiency violate the replication robustness. In this work, we are interested in maintaining both the efficiency and the symmetry properties of an allocation scheme. Let us consider the case that in the grand coalitionN+, there exists a partyi+ N that is a replication of another party i N \i+ (i.e., Di = Di+).
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Jordan (0.04)
- Oceania > Australia > Queensland (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
Labels or Preferences? Budget-Constrained Learning with Human Judgments over AI-Generated Outputs
Dong, Zihan, Wu, Ruijia, Zhang, Linjun
The increasing reliance on human preference feedback to judge AI-generated pseudo labels has created a pressing need for principled, budget-conscious data acquisition strategies. We address the crucial question of how to optimally allocate a fixed annotation budget between ground-truth labels and pairwise preferences in AI. Our solution, grounded in semi-parametric inference, casts the budget allocation problem as a monotone missing data framework. Building on this formulation, we introduce Preference-Calibrated Active Learning (PCAL), a novel method that learns the optimal data acquisition strategy and develops a statistically efficient estimator for functionals of the data distribution. Theoretically, we prove the asymptotic optimality of our PCAL estimator and establish a key robustness guarantee that ensures robust performance even with poorly estimated nuisance models. Our flexible framework applies to a general class of problems, by directly optimizing the estimator's variance instead of requiring a closed-form solution. This work provides a principled and statistically efficient approach for budget-constrained learning in modern AI. Simulations and real-data analysis demonstrate the practical benefits and superior performance of our proposed method.
Efficient Prompt Optimization Through the Lens of Best Arm Identification
The remarkable instruction-following capability of large language models (LLMs) has sparked a growing interest in automatically finding good prompts, i.e., prompt optimization. Most existing works follow the scheme of selecting from a pre-generated pool of candidate prompts. However, these designs mainly focus on the generation strategy, while limited attention has been paid to the selection method. Especially, the cost incurred during the selection (e.g., accessing LLM and evaluating the responses) is rarely explicitly considered. To overcome this limitation, this work provides a principled framework, TRIPLE, to efficiently perform prompt selection under an explicit budget constraint. TRIPLE is built on a novel connection established between prompt optimization and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB); thus, it is capable of leveraging the rich toolbox from BAI-FB systematically and also incorporating unique characteristics of prompt optimization. Extensive experiments on multiple well-adopted tasks using various LLMs demonstrate the remarkable performance improvement of TRIPLE over baselines while satisfying the limited budget constraints. As an extension, variants of TRIPLE are proposed to efficiently select examples for few-shot prompts, also achieving superior empirical performance.